158 research outputs found

    Improved indel detection in DNA and RNA via realignment with ABRA2

    Motivation: Genomic variant detection from next-generation sequencing has become established as an extremely important component of research and clinical diagnoses in both cancer and Mendelian disorders. Insertions and deletions (indels) are a common source of variation and can frequently impact functionality, thus making their detection vitally important. While substantial effort has gone into detecting indels from DNA, there is still opportunity for improvement. Further, detection of indels from RNA-Seq data has largely been an afterthought and offers another critical area for variant detection. Results: We present here ABRA2, a redesign of the original ABRA implementation that offers support for realignment of both RNA and DNA short reads. The process results in improved accuracy and scalability including support for human whole genomes. Results demonstrate substantial improvement in indel detection for a variety of data types, including those that were not previously supported by ABRA. Further, ABRA2 results in broad improvements to variant calling accuracy across a wide range of post-processing workflows including whole genomes, targeted exomes and transcriptome sequencing

    Genetic determinants of the molecular portraits of epithelial cancers

    The ability to characterize and predict tumor phenotypes is crucial to precision medicine. In this study, we present an integrative computational approach using a genome-wide association analysis and an Elastic Net prediction method to analyze the relationship between DNA copy number alterations and an archive of gene expression signatures. Across breast cancers, we are able to quantitatively predict many gene signature levels within individual tumors with high accuracy based upon DNA copy number features alone, including proliferation status and estrogen-signaling pathway activity. We can also predict many other key phenotypes, including intrinsic molecular subtypes, estrogen receptor status, and TP53 mutation. This approach is also applied to TCGA Pan-Cancer, which identifies repeatedly predictable signatures across tumor types including immune features in lung squamous and basal-like breast cancers. These Elastic Net DNA predictors could also be called from DNA-based gene panels, thus facilitating their use as biomarkers to guide therapeutic decision making

    Amplification of SOX4 promotes PI3K/Akt signaling in human breast cancer

    Purpose: The PI3K/Akt signaling axis contributes to the dysregulation of many dominant features in breast cancer including cell proliferation, survival, metabolism, motility, and genomic instability. While multiple studies have demonstrated that basal-like or triple-negative breast tumors have uniformly high PI3K/Akt activity, genomic alterations that mediate dysregulation of this pathway in this subset of highly aggressive breast tumors remain to be determined. Methods: In this study, we present an integrated genomic analysis based on the use of a PI3K gene expression signature as a framework to analyze orthogonal genomic data from human breast tumors, including RNA expression, DNA copy number alterations, and protein expression. In combination with data from a genome-wide RNA-mediated interference screen in human breast cancer cell lines, we identified essential genetic drivers of PI3K/Akt signaling. Results: Our in silico analyses identified SOX4 amplification as a novel modulator of PI3K/Akt signaling in breast cancers and in vitro studies confirmed its role in regulating Akt phosphorylation. Conclusions: Taken together, these data establish a role for SOX4-mediated PI3K/Akt signaling in breast cancer and suggest that SOX4 may represent a novel therapeutic target and/or biomarker for current PI3K family therapies

    A pan-cancer analysis of the frequency of DNA alterations across cell cycle activity levels

    Pan-cancer genomic analyses based on the magnitude of pathway activity are currently lacking. Focusing on the cell cycle, we examined the DNA mutations and chromosome arm-level aneuploidy within tumours with low, intermediate and high cell-cycle activity in 9515 pan-cancer patients with 32 different tumour types. Boxplots showed that cell-cycle activity varied broadly across and within all cancers. TP53 and PIK3CA mutations were common in all cell cycle score (CCS) tertiles but with increasing frequency as cell-cycle activity levels increased (P < 0.001). Mutations in BRAF and gains in 16p were less frequent in CCS High tumours (P < 0.001). In Kaplan–Meier analysis, patients whose tumours were CCS Low had a longer Progression Free Interval (PFI) relative to Intermediate or High (P < 0.001) and this significance remained in multivariable analysis (CCS Intermediate: HR = 1.37; 95% CI 1.17–1.60, CCS High: 1.54; 1.29–1.84, CCS Low = Ref). These results demonstrate that whilst similar DNA alterations can be found at all cell-cycle activity levels, some notable exceptions exist. Moreover, independent prognostic information can be derived on a pan-cancer level from a simple measure of cell-cycle activity

    Assembly-based inference of B-cell receptor repertoires from short read RNA sequencing data with V'DJer

    Motivation: B-cell receptor (BCR) repertoire profiling is an important tool for understanding the biology of diverse immunologic processes. Current methods for analyzing adaptive immune receptor repertoires depend upon PCR amplification of VDJ rearrangements followed by long-read amplicon sequencing spanning the VDJ junctions. While this approach has proven to be effective, it is frequently not feasible due to cost or limited sample material. Additionally, there are many existing datasets where short-read RNA sequencing data are available but PCR-amplified BCR data are not. Results: We present here V'DJer, an assembly-based method that reconstructs adaptive immune receptor repertoires from short-read RNA sequencing data. This method captures expressed BCR loci from a standard RNA-seq assay. We applied this method to 473 melanoma samples from The Cancer Genome Atlas and demonstrate V'DJer's ability to accurately reconstruct BCR repertoires from short-read mRNA-seq data

    Virus expression detection reveals RNA-sequencing contamination in TCGA

    Background: Contamination of reagents and cross contamination across samples is a long-recognized issue in molecular biology laboratories. While often innocuous, contamination can lead to inaccurate results. Cantalupo et al., for example, found HeLa-derived human papillomavirus 18 (H-HPV18) in several of The Cancer Genome Atlas (TCGA) RNA-sequencing samples. This work motivated us to assess a greater number of samples and determine the origin of possible contaminations using viral sequences. To detect viruses with high specificity, we developed the publicly available workflow, VirDetect, that detects virus and laboratory vector sequences in RNA-seq samples. We applied VirDetect to 9143 RNA-seq samples sequenced at one TCGA sequencing center (28/33 cancer types) over 5 years. Results: We confirmed that H-HPV18 was present in many samples and determined that viral transcripts from H-HPV18 significantly co-occurred with those from xenotropic mouse leukemia virus-related virus (XMRV). Using laboratory metadata and viral transcription, we determined that the likely contaminant was a pool of cell lines known as the "common reference", which was sequenced alongside TCGA RNA-seq samples as a control to monitor quality across technology transitions (i.e. microarray to GAII to HiSeq), and to link RNA-seq to previous generation microarrays that standardly used the "common reference". One of the cell lines in the pool was a laboratory isolate of MCF-7, which we discovered was infected with XMRV; another constituent of the pool was likely HeLa cells. Conclusions: Altogether, this indicates a multi-step contamination process. First, MCF-7 was infected with an XMRV. Second, this infected cell line was added to a pool of cell lines, which contained HeLa. Finally, RNA from this pool of cell lines contaminated several TCGA tumor samples, most likely during library construction. Thus, these human tumors with H-HPV18 or XMRV reads were likely not infected with H-HPV18 or XMRV

    Separation of breast cancer and organ microenvironment transcriptomes in metastases

    Background: The seed and soil hypothesis was proposed over a century ago to describe why cancer cells (seeds) grow in certain organs (soil). Since then, the genetic properties that define the cancer cells have been heavily investigated; however, genomic mediators within the organ microenvironment that mediate successful metastatic growth are less understood. These studies sought to identify cancer- and organ-specific genomic programs that mediate metastasis. Methods: In these studies, a set of 14 human breast cancer patient-derived xenograft (PDX) metastasis models was developed and then tested for metastatic tropism with two approaches: spontaneous metastases from mammary tumors and intravenous injection of PDX cells. The transcriptomes of the cancer cells when growing as tumors or metastases were separated from the transcriptomes of the microenvironment via species-specific separation of the genomes. Drug treatment of PDX spheroids was performed to determine if genes activated in metastases may identify targetable mediators of viability. Results: The experimental approaches that generated metastases in PDX models were identified. RNA sequencing of 134 tumors, metastases, and normal non-metastatic organs identified cancer- and organ-specific genomic properties that mediated metastasis. A common genomic response of the liver microenvironment was found to occur in reaction to the invading PDX cells. Genes within the cancer cells were found to be either transiently regulated by the microenvironment or permanently altered due to clonal selection of metastatic sublines. Gene Set Enrichment Analyses identified more than 400 gene signatures that were commonly activated in metastases across basal-like PDXs. A Src signaling signature was found to be extensively upregulated in metastases, and Src inhibitors were found to be cytotoxic to PDX spheroids. Conclusions: These studies identified that during the growth of breast cancer metastases, there were genomic changes that occurred within both the cancer cells and the organ microenvironment. We hypothesize that pathways upregulated in metastases are mediators of viability and that simultaneously targeting changes within different cancer cell pathways and/or different tissue compartments may be needed for inhibition of disease progression

    Differences in race, molecular and tumor characteristics among women diagnosed with invasive ductal and lobular breast carcinomas

    Background: The dominant invasive breast cancer histologic subtype, ductal carcinoma, shows intrinsic subtype diversity. However, lobular breast cancers are predominantly Luminal A. Both histologic subtypes show distinct relationships with patient and tumor characteristics, but it is unclear if these associations remain after accounting for intrinsic subtype. Methods: Generalized linear models were used to estimate relative frequency differences (RFDs) and 95% confidence intervals (95% CIs) for the associations between age, race, tumor characteristics, immunohistochemistry (IHC) and RNA-based intrinsic subtype, TP53 status, and histologic subtype in the Carolina Breast Cancer Study (CBCS, n = 3,182) and The Cancer Genome Atlas (TCGA, n = 808). Results: Relative to ductal tumors, lobular tumors were significantly more likely to be Luminal A [CBCS RNA RFD: 44.9%, 95% CI (39.6, 50.1); TCGA: RFD: 50.5%, 95% CI (43.9, 57.1)], were less frequent among young (≤ 50 years) and black women, were larger in size, low grade, less frequently had TP53 pathway defects, and were diagnosed at later stages. These associations persisted among Luminal A tumors (n = 242). Conclusions: While histology is strongly associated with molecular characteristics, histologic associations with age, race, size, grade, and stage persisted after restricting to Luminal A subtype. Histology may continue to be clinically relevant among Luminal A breast cancers
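
A relative frequency difference (RFD) is a difference between two proportions. The abstract above estimates RFDs with generalized linear models; the sketch below swaps in a simpler, unadjusted difference in proportions with a Wald 95% confidence interval, only to illustrate what the quantity measures. The function name and the counts in the usage lines are hypothetical and are not taken from the study.

```python
from math import sqrt


def relative_frequency_difference(events_a, total_a, events_b, total_b, z=1.96):
    """Unadjusted difference in proportions (group A minus group B) with a Wald CI.

    This is NOT the adjusted GLM estimate described in the abstract; it is the
    simplest two-group version of the same quantity.
    """
    p_a = events_a / total_a
    p_b = events_b / total_b
    rfd = p_a - p_b
    # Standard error of a difference between two independent proportions.
    se = sqrt(p_a * (1 - p_a) / total_a + p_b * (1 - p_b) / total_b)
    return rfd, (rfd - z * se, rfd + z * se)


# Hypothetical counts: 60 of 100 tumors in group A and 30 of 100 in group B
# carry the feature of interest.
rfd, (ci_low, ci_high) = relative_frequency_difference(60, 100, 30, 100)
print(f"RFD = {rfd:.1%}, 95% CI ({ci_low:.1%}, {ci_high:.1%})")
```

A model-based estimate would additionally adjust for covariates such as age or stage, which this two-group calculation does not attempt.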

    The Iterative Signature Algorithm for the analysis of large scale gene expression data

    We present a new approach for the analysis of genome-wide expression data. Our method is designed to overcome the limitations of traditional techniques, when applied to large-scale data. Rather than allotting each gene to a single cluster, we assign both genes and conditions to context-dependent and potentially overlapping transcription modules. We provide a rigorous definition of a transcription module as the object to be retrieved from the expression data. An efficient algorithm, that searches for the modules encoded in the data by iteratively refining sets of genes and conditions until they match this definition, is established. Each iteration involves a linear map, induced by the normalized expression matrix, followed by the application of a threshold function. We argue that our method is in fact a generalization of Singular Value Decomposition, which corresponds to the special case where no threshold is applied. We show analytically that for noisy expression data our approach leads to better classification due to the implementation of the threshold. This result is confirmed by numerical analyses based on in-silico expression data. We discuss briefly results obtained by applying our algorithm to expression data from the yeast S. cerevisiae. Comment: LaTeX, 36 pages, 8 figures
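
The iteration described in this abstract (a linear map through the normalized expression matrix, then a threshold, repeated until the gene and condition sets stop changing) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the z-score normalization, the "more than t standard deviations above the mean" threshold rule, the default thresholds, and the toy matrix are all choices made here for the example.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize a list to zero mean and unit variance (constant lists map to zeros)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def threshold(scores, t):
    """Keep scores more than t standard deviations above the mean; zero the rest."""
    return [s if z > t else 0.0 for s, z in zip(scores, zscore(scores))]


def isa_sketch(expr, seed_genes, t_gene=1.0, t_cond=1.0, max_iter=50):
    """Refine a seed gene set into a (genes, conditions) module.

    expr is a list of rows, one row per gene and one column per condition.
    """
    n_genes, n_conds = len(expr), len(expr[0])
    # Each gene (row) standardized over conditions.
    by_gene = [zscore(row) for row in expr]
    # Each condition (column) standardized over genes, stored column-wise.
    by_cond = [zscore([expr[i][j] for i in range(n_genes)]) for j in range(n_conds)]

    seed = set(seed_genes)
    genes = [1.0 if i in seed else 0.0 for i in range(n_genes)]
    conds = [0.0] * n_conds
    for _ in range(max_iter):
        # Linear map from gene scores to condition scores, then threshold.
        conds = threshold(
            [sum(x * g for x, g in zip(col, genes)) for col in by_cond], t_cond
        )
        # Linear map from condition scores back to gene scores, then threshold.
        new_genes = threshold(
            [sum(x * c for x, c in zip(row, conds)) for row in by_gene], t_gene
        )
        converged = [g != 0 for g in new_genes] == [g != 0 for g in genes]
        genes = new_genes
        if converged:  # the selected gene set no longer changes
            break
    return (
        [i for i, g in enumerate(genes) if g != 0],
        [j for j, c in enumerate(conds) if c != 0],
    )


# Toy, noise-free matrix: genes 0-4 are elevated in conditions 0-2.
expr = [[5.0 if (i < 5 and j < 3) else 0.0 for j in range(10)] for i in range(20)]
genes, conds = isa_sketch(expr, seed_genes=[0, 1])
print(genes, conds)  # prints [0, 1, 2, 3, 4] [0, 1, 2]
```

Starting from only two seed genes, the loop expands to the full planted module. Different seeds can converge to different, possibly overlapping modules, which is the behavior the abstract contrasts with assigning each gene to a single cluster.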

    Image analysis with deep learning to predict breast cancer grade, ER status, histologic subtype, and intrinsic subtype

    RNA-based, multi-gene molecular assays are available and widely used for patients with ER-positive/HER2-negative breast cancers. However, RNA-based genomic tests can be costly and are not available in many countries. Methods for inferring molecular subtype from histologic images may identify patients most likely to benefit from further genomic testing. To identify patients who could benefit from molecular testing based on H&E stained histologic images, we developed an image analysis approach using deep learning. A training set of 571 breast tumors was used to create image-based classifiers for tumor grade, ER status, PAM50 intrinsic subtype, histologic subtype, and risk of recurrence score (ROR-PT). The resulting classifiers were applied to an independent test set (n = 288), and the accuracy, sensitivity, and specificity of each were assessed on the test set. Histologic image analysis with deep learning distinguished low-intermediate vs. high tumor grade (82% accuracy), ER status (84% accuracy), Basal-like vs. non-Basal-like (77% accuracy), Ductal vs. Lobular (94% accuracy), and high vs. low-medium ROR-PT score (75% accuracy). Sampling considerations in the training set minimized bias in the test set. Incorrect classification of ER status was significantly more common for Luminal B tumors. These data provide proof of principle that molecular marker status, including a critical clinical biomarker (i.e., ER status), can be predicted with accuracy >75% based on H&E features. Image-based methods could be promising for identifying patients with a greater need for further genomic testing, or in place of classically scored variables typically accomplished using human-based scoring
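
For readers unfamiliar with the three test-set metrics named in this abstract, the sketch below shows how accuracy, sensitivity, and specificity are computed for a binary classifier. The function name and the example labels are hypothetical and are not data from the study; 1 stands for the positive class (for instance, ER-positive) and 0 for the negative class.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity from paired 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Fraction of truly positive cases that were called positive.
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        # Fraction of truly negative cases that were called negative.
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


# Hypothetical labels for eight test-set tumors.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
calls = [1, 1, 0, 0, 0, 1, 0, 1]
print(binary_metrics(truth, calls))
```

Reporting all three matters when classes are imbalanced, since a classifier can reach high accuracy while missing most cases of the rarer class.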